AITopics | covering number

Collaborating Authors

covering number

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Data-dependent Sample Complexity of Deep Neural Networks via Lipschitz Augmentation

Colin Wei, Tengyu Ma

Neural Information Processing SystemsFeb-11-2026, 11:27:28 GMT

Existing Rademacher complexity bounds for neural networks rely only on norm control of the weight matrices and depend exponentially on depth via a product of the matrix norms. Lower bounds show that this exponential dependence on depth is unavoidable when no additional properties of the training data are considered. We suspect that this conundrum comes from the fact that these bounds depend on the training data only through the margin. In practice, many data-dependent techniques such as Batchnorm improve the generalization performance. For feedforward neural nets as well as RNNs, we obtain tighter Rademacher complexity bounds by considering additional data-dependent properties of the network: the norms of the hidden layers of the network, and the norms of the Jacobians of each layer with respect to all previous layers. Our bounds scale polynomially in depth when these empirical quantities are small, as is usually the case in practice. To obtain these bounds, we develop general tools for augmenting a sequence of functions to make their composition Lipschitz and then covering the augmented functions. Inspired by our theory, we directly regularize the network's Jacobians during training and empirically demonstrate that this improves test performance.

artificial intelligence, generalization, machine learning, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.65)

Add feedback

66be31e4c40d676991f2405aaecc6934-Supplemental.pdf

Neural Information Processing SystemsFeb-9-2026, 02:58:55 GMT

dn 1, inequality, theorem 3, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > Iowa > Johnson County > Iowa City (0.14)
Asia > China > Hong Kong (0.05)
Asia > China > Hubei Province > Wuhan (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)

Add feedback

On Measuring Excess Capacity in Neural Networks

Neural Information Processing SystemsFeb-8-2026, 14:05:46 GMT

We study the excess capacity of deep networks in the context of supervised classification. That is, given a capacity measure of the underlying hypothesis class - in our case, empirical Rademacher complexity - to what extent can we (a priori) constrain this class while retaining an empirical error on a par with the unconstrained regime?

artificial intelligence, machine learning, rademacher complexity, (17 more...)

Neural Information Processing Systems

Country:

North America > Canada > Ontario > Toronto (0.14)
Europe > Austria > Salzburg > Salzburg (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Add feedback

Data-dependent Sample Complexity of Deep Neural Networks via Lipschitz Augmentation

Colin Wei, Tengyu Ma

Neural Information Processing SystemsOct-2-2025, 02:11:02 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, generalization, machine learning, (15 more...)

Neural Information Processing Systems

Country: North America (0.28)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.65)

Add feedback

Supplementary Material for " Non-Asymptotic Error Bounds for Bidirectional GANs "

Neural Information Processing SystemsAug-14-2025, 22:11:51 GMT

Department of Mathematics, The Hong Kong University of Science and Technology Clear Water Bay, Hong Kong, China yyangdc@connect.ust.hk In this supplementary material, we first prove Theorem 3.2, and then Theorems 3.1 and 3.3. We use σ to denote the ReLU activation function in neural networks, which is σ (x) = max {x, 0}. We use notation O () and O () to express the order of function slightly differently, where O () omits the universal constant not relying on d while O () omits the constant related to d . So far, most of the related works assume that the target distribution µ is supported on a compact set, for example Chen et al. (2020) and Liang (2020).

dn 1, inequality, theorem 3, (15 more...)

Neural Information Processing Systems

Country:

Asia > China > Hong Kong (0.45)
North America > United States > Iowa > Johnson County > Iowa City (0.14)
Asia > China > Hubei Province > Wuhan (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)

Add feedback

420492060687ca7448398c4c3fa10366-Supplemental-Conference.pdf

Neural Information Processing SystemsAug-14-2025, 10:51:11 GMT

constraint, lipschitz constant, rademacher complexity, (14 more...)

Neural Information Processing Systems

Country:

North America > Canada > Ontario > Toronto (0.14)
Europe > Austria > Salzburg > Salzburg (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.67)

Add feedback

Provably Efficient RL for Linear MDPs under Instantaneous Safety Constraints in Non-Convex Feature Spaces

Roknilamouki, Amirhossein, Ghosh, Arnob, Shi, Ming, Nourzad, Fatemeh, Ekici, Eylem, Shroff, Ness B.

arXiv.org Artificial IntelligenceFeb-25-2025

In Reinforcement Learning (RL), tasks with instantaneous hard constraints present significant challenges, particularly when the decision space is non-convex or non-star-convex. This issue is especially relevant in domains like autonomous vehicles and robotics, where constraints such as collision avoidance often take a non-convex form. In this paper, we establish a regret bound of $\tilde{\mathcal{O}}\bigl(\bigl(1 + \tfrac{1}{\tau}\bigr) \sqrt{\log(\tfrac{1}{\tau}) d^3 H^4 K} \bigr)$, applicable to both star-convex and non-star-convex cases, where $d$ is the feature dimension, $H$ the episode length, $K$ the number of episodes, and $\tau$ the safety threshold. Moreover, the violation of safety constraints is zero with high probability throughout the learning process. A key technical challenge in these settings is bounding the covering number of the value-function class, which is essential for achieving value-aware uniform concentration in model-free function approximation. For the star-convex setting, we develop a novel technique called Objective Constraint-Decomposition (OCD) to properly bound the covering number. This result also resolves an error in a previous work on constrained RL. In non-star-convex scenarios, where the covering number can become infinitely large, we propose a two-phase algorithm, Non-Convex Safe Least Squares Value Iteration (NCS-LSVI), which first reduces uncertainty about the safe set by playing a known safe policy. After that, it carefully balances exploration and exploitation to achieve the regret bound. Finally, numerical simulations on an autonomous driving scenario demonstrate the effectiveness of NCS-LSVI.

instantaneous safety constraint, machine learning, reinforcement learning, (15 more...)

arXiv.org Artificial Intelligence

2502.18655

Country: North America > United States (0.14)

Genre:

Research Report > New Finding (0.45)
Research Report > Promising Solution (0.34)

Industry:

Transportation > Ground > Road (0.34)
Energy > Oil & Gas > Upstream (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.89)
Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.66)

Add feedback

Bridging the Gap: Rademacher Complexity in Robust and Standard Generalization

Xiao, Jiancong, Sun, Ruoyu, Long, Qi, Su, Weijie J.

arXiv.org Machine LearningJun-8-2024

Training Deep Neural Networks (DNNs) with adversarial examples often results in poor generalization to test-time adversarial data. This paper investigates this issue, known as adversarially robust generalization, through the lens of Rademacher complexity. Building upon the studies by Khim and Loh (2018); Yin et al. (2019), numerous works have been dedicated to this problem, yet achieving a satisfactory bound remains an elusive goal. Existing works on DNNs either apply to a surrogate loss instead of the robust loss or yield bounds that are notably looser compared to their standard counterparts. In the latter case, the bounds have a higher dependency on the width $m$ of the DNNs or the dimension $d$ of the data, with an extra factor of at least $\mathcal{O}(\sqrt{m})$ or $\mathcal{O}(\sqrt{d})$. This paper presents upper bounds for adversarial Rademacher complexity of DNNs that match the best-known upper bounds in standard settings, as established in the work of Bartlett et al. (2017), with the dependency on width and dimension being $\mathcal{O}(\ln(dm))$. The central challenge addressed is calculating the covering number of adversarial function classes. We aim to construct a new cover that possesses two properties: 1) compatibility with adversarial examples, and 2) precision comparable to covers used in standard settings. To this end, we introduce a new variant of covering number called the \emph{uniform covering number}, specifically designed and proven to reconcile these two properties. Consequently, our method effectively bridges the gap between Rademacher complexity in robust and standard generalization.

adversarial example, generalization, rademacher complexity, (13 more...)

arXiv.org Machine Learning

2406.05372

Country:

North America > United States > Pennsylvania (0.04)
Asia > China > Hong Kong (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Add feedback

Sequence Length Independent Norm-Based Generalization Bounds for Transformers

Trauger, Jacob, Tewari, Ambuj

arXiv.org Machine LearningOct-19-2023

Since Vaswani et al. (2017) debuted the Transformer, it has become one of the most preeminent architectures of its time. It has achieved state of the art prediction capabilities in various fields (Dosovitskiy et al., 2020; Wu et al., 2022; Vaswani et al., 2017; Pettersson and Falkman, 2023) and an implementation of it has even passed the BAR exam (Katz et al., 2023). With such widespread use, the theoretical underpinnings of this architecture are of great interest. Specifically, this paper is concerned with bounding the generalization error when using the Transformer in supervised learning. Upper bounding this can be used to help understand how sample size needs to scale with different architecture parameters and is a very common theoretical tool to understand machine learning algorithms (Kakade et al., 2008; Garg et al., 2020; Truong, 2022; Lin and Zhang, 2019).

artificial intelligence, machine learning, sequence length, (17 more...)

arXiv.org Machine Learning

2310.13088

Country:

North America > United States > Michigan > Washtenaw County > Ann Arbor (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Approximate Description Length, Covering Numbers, and VC Dimension

Daniely, Amit, Katzhendler, Gal

arXiv.org Artificial IntelligenceSep-26-2022

Neural Networks are a widely used tool nowadays, despite the lack of theoretical background supporting their abilities to generalize well. Classical notions of learning guarantee generalization only if there are more examples that parameters. It is clear that a stronger assumption is needed to achieve tighter bounds, and indeed, different types of assumptions were used in order to fill this empirical-theoretical gap, including assumptions on robustness to noise [2], bias of the learning algorithm [5, 10], and norm bounds on the weight's matrices [8, 9] The idea of Approximate Description Length [4] was conceived as a part of the line of research working under assumptions that bound the magnitude of the network's weight matrices.

adl, artificial intelligence, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2209.12882

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Netherlands > South Holland > Dordrecht (0.04)
Asia > Middle East > Israel (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.84)

Add feedback